(57) ABSTRACT

An improved directory-based, hierarchical memory system 
is disclosed that is capable of simultaneously processing 
multiple ownership requests initiated by a processor that is 
coupled to the memory. An ownership request is initiated on 
behalf of a processor to obtain an exclusive copy of memory 
data that may then be modified by the processor. In the data
processing system of the preferred embodiment, multiple 
processors are each coupled to a respective cache memory. 
These cache memories are further coupled to a hierarchical 
memory structure including a main memory and one or more 
additional intermediate levels of cache memory. As is known 
in the art, copies of addressable portions of the main 
memory may reside in one or more of the cache memories 
within the hierarchical memory system. A memory directory 
records the location and status of each addressable portion of 
memory so that coherency may be maintained. Prior to 
updating an addressable portion of memory in a respectively 
coupled cache, a processor must acquire an exclusively 
"owned" copy of the requested memory portion from the 
hierarchical memory. This is accomplished by issuing a 
request for ownership to the hierarchical memory. Return of 
ownership may impose memory latency for write requests. 
To reduce this latency, the current invention allows multiple 
requests for ownership to be initiated by a processor simul- 
taneously. In the preferred embodiment, write request logic 
receives two pending write requests from a processor. For 
each request that is associated with an addressable memory 
location that is not yet owned by the processor, an associated 
ownership request is issued to the hierarchical memory. The 
requests are not processed in the respective cache memory 
until after the associated ownership grant is returned from 
the hierarchical memory system. Because ownership is not 
necessarily granted by the hierarchical memory in the order 
ownership requests are issued, control logic is provided to 
ensure that a local cache processes all write requests in 
time-order so that memory consistency is maintained. 
According to another aspect of the invention, read request 
logic is provided to allow a memory read request to by-pass 
all pending write requests previously issued by the same 
processor. In this manner, read operations are not affected by 
delays associated with ownership requests. 

20 Claims, 7 Drawing Sheets 
CACHE CONTROL SYSTEM FOR PERFORMING MULTIPLE OUTSTANDING
OWNERSHIP REQUESTS

Cross-Reference to Other Applications and Issued Patent

The following co-pending applications of common
assignee contain some common disclosure:

"A Directory-Based Cache Coherency System", filed
Nov. 5, 1997, Ser. No. 08/965,004, incorporated herein
by reference in its entirety;

"Message Flow Protocol for Avoiding Deadlocks", U.S.
Pat. No. 6,014,709, issued Jan. 11, 2001, incorporated
herein by reference in its entirety;

"High-Speed Memory Storage Unit for a Multiprocessor
System Having Integrated Directory and Data Storage
Subsystems", filed Dec. 31, 1997, Ser. No. 09/001,588,
incorporated herein by reference in its entirety; and

"Directory-Based Cache Coherency System Supporting
Multiple Instruction Processor and Input/Output
Caches", filed Dec. 31, 1997, Ser. No. 09/001,598,
incorporated herein by reference in its entirety; and

"Directory-Based Cache Coherency System Supporting
Multiple Instruction Processor and Input/Output
Caches", a Divisional of Ser. No. 09/001,598, filed
Aug. 24, 2000, Ser. No. 09/645,233, incorporated
herein by reference in its entirety.

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates generally to an improved system
and method for maintaining cache coherency in a data
processing system in which multiple processors are coupled
to a directory-based, hierarchical shared memory; and more
particularly, relates to a system that allows one or more of
the processors to each have multiple ownership requests
simultaneously pending to the shared memory, wherein each
of the ownership requests is a request to gain exclusive
access to a requested, addressable portion of the memory.

2. Description of the Prior Art

Data processing systems are becoming increasingly complex.
Some systems, such as Symmetric Multi-Processor
(SMP) computer systems, couple two or more Instruction
Processors (IPs) and multiple Input/Output (I/O) Modules to
shared memory. This allows the multiple IPs to operate
simultaneously on the same task, and also allows multiple
tasks to be performed at the same time to increase system
throughput.

As the number of units coupled to a shared memory
increases, more demands are placed on the memory and
memory latency increases. To address this problem, high
speed cache memory systems are often coupled to one or
more of the IPs for storing data signals that are copied from
main memory. These cache memories are generally capable
of processing requests faster than the main memory while
also serving to reduce the number of requests that the main
memory must handle. This increases system throughput.

While the use of cache memories increases system
throughput, it causes other design challenges. When multiple
cache memories are coupled to a single main memory
for the purpose of temporarily storing data signals, some
system must be utilized to ensure that all IPs and I/O
Modules are working from the same (most recent) copy of
the data. For example, if a copy of a data item is stored, and
subsequently modified, in a cache memory, another IP
requesting access to the same data item must be prevented
from using the older copy of the data item stored either in
main memory or the requesting IP's cache. This is referred
to as maintaining cache coherency. Maintaining cache
coherency becomes more difficult as more caches are added
to the system since more copies of a single data item may
have to be tracked.

Many methods exist to maintain cache coherency. Some
earlier systems achieve coherency by implementing memory
locks. That is, if an updated copy of data exists within a local
cache, other processors are prohibited from obtaining a copy
of the data from main memory until the updated copy is
returned to main memory, thereby releasing the lock. For
complex systems, the additional hardware and/or operating
time required for setting and releasing the locks within main
memory cannot be [illegible]. Furthermore, reliance on
[illegible] of applications [illegible] processing.

Another method of maintaining cache coherency is shown
in U.S. Pat. No. 4,843,542 to Dashiell et al., and in
U.S. Pat. No. 4,755,930 to Wilson, Jr. et al. These
patents discuss a system wherein each processor has a local
cache coupled to a shared memory through a common
memory bus. Each processor is responsible for monitoring,
or "snooping", the common bus to maintain currency of its
own cache data. These snooping protocols increase processor
overhead, and are unworkable in hierarchical memory
configurations that do not have a common bus structure. A
similar snooping protocol is shown in U.S. Pat. No. 5,025,365
to Mathur et al., which teaches local caches that monitor
a system bus for the occurrence of memory accesses which
would invalidate a local copy of data. The Mathur snooping
[illegible] removes [illegible] of overhead associated
[illegible] data within the local caches at times
[illegible]; however, the Mathur
[illegible] unworkable in memory systems [illegible]
common bus structure.

Another method of maintaining cache coherency is shown
in U.S. Pat. No. 5,423,016 to Tsuchiya. The method
described in this patent involves providing a memory structure
called a duplicate tag with each cache memory. The
duplicate tags record which data items are stored within the
associated cache. When a data item is modified by a
processor, an invalidation request is routed to all of the other
duplicate tags in the system. The duplicate tags are searched
for the address of the referenced data item. If found, the data
item is marked as invalid in the other caches. Such an
approach is impractical for distributed systems having many
caches interconnected in a hierarchical fashion because the
time required to route the invalidation requests poses an
undue overhead.

For distributed systems having hierarchical memory
structures, a directory-based coherency system becomes
more practical. Directory-based coherency systems utilize a
centralized directory to record the location and the status of
data as it exists throughout the system. For example, the
directory records which caches have a copy of the data, and
further records if any of the caches have an updated copy of
the data. When a cache makes a request to main memory for
a data item, the central directory is consulted to determine
where the most recent copy of that data item resides. Based
on this information, the most recent copy of the data is
retrieved so that it may be provided to the requesting cache.
The central directory is then updated to reflect the new status
for that unit of memory. A novel directory-based cache
coherency system for use with multiple Instruction Processors
coupled to a hierarchical cache structure is described in
the co-pending application entitled "Directory-Based Cache 
Coherency System Supporting Multiple Instruction Proces- 
sor and Input/Output Caches" referenced above and which is 
incorporated herein by reference in its entirety. 

The use of the afore-mentioned directory-based cache 
coherency system provides an efficient mechanism for shar- 
ing data between multiple processors that are coupled to a 
distributed, hierarchical memory structure. Using such a 
system, the memory structure may be incrementally 
expanded to include any multiple levels of cache memory 
while still maintaining the coherency of the shared data. As 
the number of levels of hierarchy in the memory system is 
increased, however, some efficiency is lost when data 
requested by one cache memory in the system must be 
retrieved from another cache. 

As an example of performance degradation associated 
with memory requests in a hierarchical cache memory 
system, consider a system having a main memory coupled to 
three hierarchical levels of cache memory. In the exemplary 
system, multiple third-level caches are coupled to the main 
memory, multiple second-level caches are coupled to each 
third-level cache, and at least one first-level cache is coupled 
to each second-level cache. This exemplary system includes
a non-inclusive caching scheme. This means that all data
stored in a first-level cache is not necessarily stored in the
inter-connected second-level cache, and all data stored in a
second-level cache is not necessarily stored in the inter- 
connected third-level cache. 

Within the above-described system, one or more proces- 
sors are respectively coupled to make memory requests to an 
associated first-level cache. Requests for data items not 
resident in the first-level cache are forwarded on to the 
inter-coupled second-level, and in some cases, the third- 
level caches. If neither of the intercoupled second or third 
level caches stores the requested data, the request is for- 
warded to main memory. 
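
The forwarding order just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the logic of the preferred embodiment: the function name `lookup`, the dictionary-based caches, and the level labels are hypothetical, and directory checks are omitted.

```python
from typing import Dict, List, Tuple


def lookup(address: int,
           levels: List[Tuple[str, Dict[int, bytes]]],
           main_memory: Dict[int, bytes]) -> Tuple[str, bytes]:
    """Search the cache levels in order (first, second, third), then main memory.

    Returns the label of the structure that supplied the data together
    with the data itself. The caches are non-inclusive, so every level
    has to be probed on its own.
    """
    for label, cache in levels:
        if address in cache:
            return label, cache[address]
    # No inter-coupled cache held the data: forward to main memory.
    return "main", main_memory[address]


# Hypothetical contents: the data for 0x40 is resident only in the second level.
levels = [("first", {}), ("second", {0x40: b"cached"}), ("third", {})]
main_memory = {0x40: b"memory", 0x80: b"memory-only"}
```

With these contents, a request for address 0x40 is satisfied by the second-level cache, while a request for 0x80 falls through all three levels to main memory.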

Within the current exemplary system, assume a processor 
makes a request for data to the intercoupled first-level cache.
The requested data is not stored in this first-level cache, but 
instead is stored in a different first-level cache within the 
system. If this request involves obtaining access to a read- 
only copy of the data, and the first-level cache that stores the 
data is storing a read-only copy, the request can be com- 
pleted without involving the first-level cache that currently 
stores a copy of the data. That is, the request may be 
processed by one of the inter-connected second or third- 
level caches, or by the main memory, depending on which 
one or more of the memory structures has a copy of the data. 

In addition to read requests, other types of requests may 
be made to obtain "exclusive" copies of data that can be 
updated by the requesting processor. In these situations, any 
previously cached copies of the data must be marked as 
invalid before the request can be granted to the requesting 
cache. That is, in these situations, copies of the data may not 
be shared among multiple caches. This is necessary so that 
there is only one "most-current" copy of the data existing in 
the system and no processor is working from outdated data. 
Returning to the current example, assume the request from 
the first-level cache is for an exclusive copy of data. This 
request must be passed via the cache hierarchy to the main 
memory. The main memory forwards this request back down 
the hierarchical memory structure to the first-level cache that 
stores the requested data. This first-level cache must invalidate
its stored copy of the data, indicating that this copy may
no longer be used. If necessary, modified data is passed back
to the main memory to be stored in the main memory and to
be forwarded on to the requesting first-level cache. In this 
manner, the requesting cache is provided with an exclusive 
copy of the most current data. 
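
The exclusive-copy example above can be modeled roughly as follows. This is a simplified Python sketch: the class names `Cache` and `MainMemory`, the `holders` table, and the method names are hypothetical, and the intermediate second-level and third-level caches through which the request actually travels are collapsed into direct calls.

```python
class Cache:
    """A first-level cache holding address -> (data, modified) entries."""

    def __init__(self, name):
        self.name = name
        self.lines = {}

    def invalidate(self, address):
        """Discard the local copy; hand back its data only if it was modified."""
        data, modified = self.lines.pop(address)
        return data if modified else None


class MainMemory:
    def __init__(self):
        self.data = {}     # address -> data stored in main memory
        self.holders = {}  # address -> list of caches holding a copy

    def request_exclusive(self, requester, address):
        """Invalidate every other copy, then give the requester the current data."""
        for holder in self.holders.get(address, []):
            if holder is requester:
                continue
            returned = holder.invalidate(address)
            if returned is not None:
                # Modified data is stored in main memory before being forwarded.
                self.data[address] = returned
        self.holders[address] = [requester]
        requester.lines[address] = (self.data[address], False)
        return self.data[address]
```

In this model the requester always receives the most current data: if the previous holder had modified its copy, that copy passes through main memory on its way to the requesting cache.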

As may be seen from the current example, in a hierarchical
memory system having multiple levels of cache that
are not all interconnected by a common bus structure, 
obtaining an exclusive copy of data that can be utilized by 
a processor for update purposes may be time-consuming. As
the number of these so-called "ownership" requests for
obtaining an exclusively "owned" data copy increases, throughput may
decrease. This is especially true if additional levels of
hierarchy are included in the memory structure. What is 
needed, therefore, is a system that minimizes the impact on
processing throughput that is associated with making ownership
requests within a hierarchical, directory-based
memory system. 

OBJECTS 

The primary object of the invention is to provide an 
improved shared memory system for a multiprocessor data 
processing system; 

A further object is to provide a hierarchical, directory- 
based shared memory system having improved 
response times; 
A yet further object is to provide a memory system 
allowing multiple ownership requests to be pending to 
main memory from a single processor at once; 
Yet another object is to provide a memory system that 
allows multiple ownership requests to be pending from 
all processors in the system simultaneously; 
A still further object is to provide a memory system that 
allows an instruction processor to continue processing
instructions while multiple ownership requests are 
pending to main memory; 
Another object is to provide a memory system that allows 
multiple memory write requests that were issued by the 
same instruction processor to be processed simultaneously
by the memory while additional write requests
are queued for processing by the instruction processor; 
A yet further object is to provide a memory system
allowing a subsequently-issued memory read request to
by-pass all pending write requests that were issued by
the same processor, and to thereby allow the read
request to complete without being delayed by owner- 
ship requests to main memory; and 
Yet another object is to provide a memory system that 
ensures that multiple simultaneously-pending memory 
write requests from the same processor are processed in
the time-order in which the requests were issued so that 
data coherency is maintained. 
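
Two of the objects listed above, time-ordered processing of simultaneously pending writes and read by-pass, can be illustrated with a small Python model. It is a sketch under stated assumptions rather than the disclosed logic: the class `WriteRequestModel` and its method names are hypothetical, the number of pending writes is not limited to two, and the rule that a read may not by-pass a pending write to the same address is taken from the Summary.

```python
from collections import deque


class WriteRequestModel:
    """Writes complete in issue order even when ownership arrives out of order."""

    def __init__(self):
        self.owned = set()      # addresses for which ownership has been granted
        self.pending = deque()  # (address, value) pairs in issue order
        self.requested = []     # ownership requests, in the order issued
        self.cache = {}         # contents of the local cache

    def issue_write(self, address, value):
        self.pending.append((address, value))
        if address not in self.owned and address not in self.requested:
            self.requested.append(address)  # ownership request goes out at once
        self._drain()

    def grant_ownership(self, address):
        self.owned.add(address)
        self._drain()

    def _drain(self):
        # Only the oldest pending write may complete; younger writes wait
        # behind it even if their ownership has already been granted.
        while self.pending and self.pending[0][0] in self.owned:
            address, value = self.pending.popleft()
            self.cache[address] = value

    def read_may_bypass(self, address):
        # A read by-passes pending writes unless one targets the same address.
        return all(pending_address != address
                   for pending_address, _ in self.pending)
```

If two writes are issued and ownership for the second is granted first, nothing is written to the cache until the first grant arrives, at which point both writes complete in issue order.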

SUMMARY OF THE INVENTION 

The objectives of the present invention are achieved in a
memory system that allows a processor to have multiple
ownership requests pending to memory simultaneously. The
data processing system of the preferred embodiment
includes multiple processors, each coupled to a respective
cache memory. These cache memories are further coupled to
a main memory through one or more additional intermediate
levels of cache memory. As is known in the art, copies of
main memory data may reside in one or more of the cache
memories within the hierarchical memory system. The main
memory includes a directory to record the location and
status of the most recent copy of each addressable portion of
memory.

A processor makes memory requests to its respectively-coupled
cache memory. In the case of write requests, the
respectively coupled cache memory must verify that ownership
has already been obtained for the requested addressable
portion of memory. If ownership has not been
obtained, the cache memory must make an ownership
request via the intermediate levels of cache memory. This
request will be forwarded to main memory if necessary,
which in turn, may be required to complete the request by
invalidating a copy of the data located in another cache
memory. Request processing may also require that an
updated data copy be obtained from the other cache memory
and forwarded to the requesting cache.

The current invention allows multiple requests for ownership
to be pending from a processor's respectively-coupled
cache memory simultaneously. In the preferred
embodiment, first request logic associated with the
respectively-coupled cache memory receives a first write
request from the processor. The first write request will be
staged to second write request logic if another write request
is not already being processed by the respectively-coupled
cache. After the first request is staged, another write request
may be provided to the first request logic for processing.
After being staged to the second write request logic, a
determination is made as to whether ownership is available
for the addressable memory portion requested by the first
write request. If ownership is not available, an ownership
request is made for the requested memory portion via the
intermediate cache structure. While this request is being
issued, a second determination is made regarding the availability
of ownership for the second write request. A second
ownership request is generated if ownership is again
unavailable for the requested memory portion.

Eventually, ownership and any updated data associated
with the first request will be provided to the requesting cache
by main memory, or alternatively, by another cache memory.
The first write request may then be completed to the requesting
cache. After the completion of the first request, ownership
for the second request is, in most cases, already
available because of the concurrent request processing for
the first and second ownership requests. The second write
request is staged to the second write request logic and
completed without delay. Thus, the time required to process
the second request is, in most instances, buried by the
processing of the first request, thereby reducing the processing
time for the two requests by almost fifty percent.

In the system of the preferred embodiment, ownership
grants are not necessarily provided in the order in which
ownership requests are made. Therefore, in the above
example, ownership for the second request may become
available prior to that for the first request. The current
invention includes control logic to ensure that requests are
processed in the order issued by the respective instruction
processor, regardless of the order in which ownership is
granted. This is necessary to ensure newer data is not
erroneously overwritten by an older request.

According to another aspect of the invention, a write
request buffer coupled to the respective cache memory is
provided to receive additional pending write requests issued
by the processor. The processor may continue issuing write
requests until the write request buffer is full. The pending
requests are processed in the order they are issued.
Therefore, after the cache completes processing of the older
of two simultaneously-pending write requests in the above-described
manner, a predetermined one of the requests
stored in the write request buffer is removed from the buffer
and provided to the first write request logic to be processed
by the cache.

The current invention further provides read request processing
logic coupled to the respectively-coupled cache. A
read request issued by the processor is received by the read
request logic, and is processed, in most cases, before processing
completes for any of the multiple pending write
requests. An exception to this rule exists for a read request
that requests access to the same addressable portion of
memory as was requested by a previously-issued write
request. In this case, the processing of the read request must
be delayed until the previously-issued write operation is
completed. The expedited handling of read requests is
performed because, in the system of the preferred
embodiment, an instruction processor cannot continue
execution until a pending read request to memory has been
completed. In contrast, outstanding write requests do not
cause the processor to "stall" in this manner, and processor
execution may continue even if multiple outstanding write
requests are pending to memory.

Still other objects and advantages of the present invention
will become readily apparent to those skilled in the art from
the following detailed description of the preferred embodiment
and the drawings, wherein only the preferred embodiment
of the invention is shown, simply by way of illustration
of the best mode contemplated for carrying out the invention.
As will be realized, the invention is capable of other
and different embodiments, and its several details are
capable of modifications in various respects, all without
departing from the invention. Accordingly, the drawings and
description are to be regarded to the extent of applicable law
as illustrative in nature and not as restrictive.

BRIEF DESCRIPTION OF THE FIGURES

The present invention will be described with reference to
the accompanying drawings.

FIG. 1 is a block diagram of a Symmetrical Multi-Processor
(SMP) system platform according to a preferred embodiment
of the present invention;

FIG. 2 is a block diagram of a Processing Module (POD)
according to one embodiment of the present invention;

FIG. 3 is a block diagram of a Sub-Processing Module
(Sub-POD) according to one embodiment of the present
invention;

FIG. 4 is a block diagram of the Instruction Processor and
Second Level Cache of the preferred embodiment; and

FIGS. 5A, 5B, and 5C, when arranged as shown in FIG.
5, are a flowchart illustrating the manner in which two
requests for ownership are processed simultaneously
according to the memory coherency scheme of the preferred
embodiment.

DETAILED DESCRIPTION OF THE
PREFERRED EMBODIMENTS

System Platform

FIG. 1 is a block diagram of a Symmetrical Multi-Processor
(SMP) System Platform according to a preferred
embodiment of the present invention. System Platform 100
includes one or more Memory Storage Units (MSUs) in
dashed block 110 individually shown as MSU 110A, MSU
110B, MSU 110C and MSU 110D, and one or more Processing
Modules (PODs) in dashed block 120 individually
shown as POD 120A, POD 120B, POD 120C, and POD
120D. Each unit in MSU 110 is interfaced to all PODs 120A,
120B, 120C, and 120D via a dedicated, point-to-point connection
referred to as an MSU Interface (MI) in dashed
block 130, individually shown as 130A through 130S. For
example, MI 130A interfaces POD 120A to MSU 110A, MI
130B interfaces POD 120A to MSU 110B, MI 130C interfaces
POD 120A to MSU 110C, MI 130D interfaces POD
120A to MSU 110D, and so on.

In one embodiment of the present invention, MI 130
comprises separate bi-directional data and bi-directional
address/command interconnections, and further includes
unidirectional control lines that control the operation on the
data and address/command interconnections (not individually
shown). The control lines run at system clock frequency
(SYSCLK) while the data bus runs source synchronous at
two times the system clock frequency (2xSYSCLK). In a
preferred embodiment of the present invention, the system
clock frequency is 100 megahertz (MHz).

Any POD 120 has direct access to data in any MSU 110
via one of MIs 130. For example, MI 130A allows POD
120A direct access to MSU 110A and MI 130F allows POD
120B direct access to MSU 110B. PODs 120 and MSUs 110
are discussed in further detail below.

System Platform 100 further comprises Input/Output
(I/O) Modules in dashed block 140 individually shown as
I/O Modules 140A through 140H, which provide the interface
between various Input/Output devices and one of the
PODs 120. Each I/O Module 140 is connected to one of the
PODs across a dedicated point-to-point connection called
the MIO Interface in dashed block 150 individually shown
as 150A through 150H. For example, I/O Module 140A is
connected to POD 120A via a dedicated point-to-point MIO
Interface 150A. The MIO Interfaces 150 are similar to the
MI Interfaces 130, but in the preferred embodiment have a
transfer rate that is approximately half the transfer rate of the
MI Interfaces because the I/O Modules 140 are located at a
greater distance from the PODs 120 than are the MSUs 110.
The I/O Modules 140 will be discussed further below.

Processing Module (POD)

FIG. 2 is a block diagram of a processing module (POD)
according to one embodiment of the present invention. POD
120A is shown, but each of the PODs 120A through 120D
has a similar configuration. POD 120A includes two Sub-Processing
Modules (Sub-PODs) 210A and 210B. Each of
the Sub-PODs 210A and 210B is interconnected to a
Crossbar Module (TCM) 220 through dedicated point-to-point
Interfaces 230A and 230B, respectively, that are similar
to the MI interconnections 130. TCM 220 further interconnects
to one or more I/O Modules 140 via the respective
point-to-point MIO Interfaces 150. TCM 220 both buffers
data and functions as a switch between Interfaces 230A,
230B, 150A, and 150B, and MI Interfaces 130A through
130D. When an I/O Module 140 or a Sub-POD 210 is
interconnected to one of the MSUs via the TCM 220, the
MSU connection is determined by the address provided by
the I/O Module or the Sub-POD, respectively. In general, the
TCM maps one-fourth of the memory address space to each
of the MSUs 110A-110D. According to one embodiment of
the current system platform, the TCM 220 can further be
configured to perform address interleaving functions to the
various MSUs. The TCM may also be utilized to perform
address translation functions that are necessary for ensuring
that each processor (not shown in FIG. 2) within each of the
Sub-PODs 210 and each I/O Module 140 views memory as
existing within a contiguous address space as is required by
certain off-the-shelf operating systems.

In one embodiment of the present invention, I/O Modules
140 are external to Sub-POD 210 as shown in FIG. 2. This
embodiment allows system platform 100 to be configured
based on the number of I/O devices used in a particular
application. In another embodiment of the present invention,
one or more I/O Modules 140 are incorporated into Sub-POD
210. I/O Modules 140 are discussed in further detail
below.

Sub-Processing Module

FIG. 3 is a block diagram of a Sub-Processing Module
(Sub-POD) according to one embodiment of the present
invention. Sub-POD 210A is shown, but it is understood that
all Sub-PODs 210 have similar structures and interconnections.
In this embodiment, Sub-POD 210A includes a Third-Level
Cache (TLC) 310 and one or more Coherency
Domains 320 (shown as Coherency Domains 320A, 320B,
320C, and 320D). TLC 310 is connected to Coherency
Domains 320A and 320B via Bus 330A, and is connected to
Coherency Domains 320C and 320D via Bus 330B. TLC
310 caches data from the MSU, and maintains data coherency
among all of Coherency Domains 320, guaranteeing
that each processor is always operating on the latest copy of
the data.

Each Coherency Domain 320 includes an Instruction
Processor (IP) 350 (shown as IPs 350A, 350B, 350C, and
350D). Each of the IPs includes a respective First-Level
Cache (not shown in FIG. 3). Each of the IPs is coupled to
a Second-Level Cache (SLC) 360 (shown as SLC 360A,
360B, 360C and 360D) via a respective point-to-point
Interface 370 (shown as Interfaces 370A, 370B, 370C, and
370D). Each SLC further interfaces to Front-Side Bus (FSB)
Logic 380 (shown as FSB Logic 380A, 380B, 380C, and
380D) via a respective one of Interfaces 385A, 385B, 385C,
and 385D. FSB Logic is also coupled to a respective one of
Buses 330A or 330B.

In the preferred embodiment, the SLCs 360 operate at a
different clock speed than Buses 330A and 330B. Moreover,
the request and response protocols used by the SLCs 360 are
not the same as those employed by Buses 330A and 330B.
Therefore, FSB logic is needed to translate the SLC requests
into a format and clock speed that is compatible with that
used by Buses 330.

Directory-Based Data Coherency Scheme of the
System Architecture

Before discussing the Instruction Processor and Second-Level
Cache in more detail, the data coherency scheme of
the current system is discussed. Data coherency involves
ensuring that each POD 120 operates on the latest copy of
the data, wherein the term "data" in the context of the current
Application refers to both processor instructions and any
other types of information such as operands stored within
memory. Since multiple copies of the same data may exist
within platform memory, including the copy in the MSU and
additional copies in various local cache memories (local
copies), some scheme is needed to control which data copy
is considered the "latest" copy.

The platform of the current invention uses a directory
protocol to maintain data coherency. In a directory protocol,
information associated with the status of units of data is
stored in memory. This information is monitored and
updated by a controller when a unit of data is requested by
one of the PODs 120. In the preferred embodiment of the
present invention, directory information is recorded in a
directory memory in each of the MSUs. These are shown as
Directory Memories 160A, 160B, 160C, and 160D of FIG.
1. Directory information is recorded in each of the Directory
Memories for each 64-byte segment of data in the respective
MSU 110, wherein such a segment is referred to as a cache 
line. The status of each cache line is updated when access to 
the cache line is granted to one of the Sub-PODs 210. The 
status information includes a vector which indicates which 
of the Sub-PODs have a local copy of the cache line. 
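
A directory entry of this kind might be modeled as shown below. The sketch is illustrative only: the names `cache_line_index` and `DirectoryEntry` are hypothetical, the vector is represented as an integer bit mask, and the count of eight Sub-PODs assumes four PODs of two Sub-PODs each, as in the platform described for FIG. 1 and FIG. 2. The actual encoding of the directory state is not given in this section.

```python
CACHE_LINE_BYTES = 64  # segment size stated in the text
NUM_SUB_PODS = 8       # assumption: four PODs, two Sub-PODs each


def cache_line_index(address: int) -> int:
    """Map a byte address to the index of the 64-byte cache line containing it."""
    return address // CACHE_LINE_BYTES


class DirectoryEntry:
    """Status vector for one cache line; bit i set means Sub-POD i holds a copy."""

    def __init__(self):
        self.vector = 0

    def add_copy(self, sub_pod: int) -> None:
        self.vector |= 1 << sub_pod

    def remove_copy(self, sub_pod: int) -> None:
        self.vector &= ~(1 << sub_pod)

    def holders(self):
        return [i for i in range(NUM_SUB_PODS) if (self.vector >> i) & 1]
```

Byte addresses 0 through 63 fall in cache line 0 and address 64 begins cache line 1, so a single entry covers every byte of its line.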

In the present invention, the status of the cache line 
includes "shared" and "exclusive." Shared status means that 
one or more Sub-PODs have a local copy of the cache line 
for read-only purposes. A Sub-POD having shared access to 
a cache line may not update the cache line. Thus, for 
example, Sub-PODs 210A and 210B may have shared 
access to a cache line such that a copy of the cache line exists 
in the Third-Level Caches 310 of both Sub-PODs for 
read-only purposes. 

In contrast to shared status, exclusive status, which is also 
referred to as exclusive ownership, indicates that only one 
Sub-POD "owns" the cache line. A Sub-POD must gain 
exclusive ownership of a cache line before data within the 
cache line may be modified. When a Sub-POD has exclusive 
ownership of a cache line, no other Sub-POD may have a 
copy of that cache line in any of its associated caches. 

Before a Sub-POD can gain exclusive ownership of a 
cache line, any other Sub-PODs having copies of that cache 
line must complete any in-progress operations to that cache
line. Then, if one or more Sub-POD(s) have shared access to 
the cache line, the Sub-POD(s) must designate their local 
copies of the cache line as invalid. This is known as a Purge 
operation. If, on the other hand, a single Sub-POD has 
exclusive ownership of the requested cache line, and the 
local copy has been modified, the local copy must be 
returned to the MSU before the new Sub-POD can gain 
exclusive ownership of the cache line. This is known as a 
"Return" operation, since the previous exclusive owner 
returns the cache line to the MSU so it can be provided to 
the requesting Sub-POD, which becomes the new exclusive 
owner. In addition, the updated cache line is written to the 
MSU sometime after the Return operation has been 
performed, and the directory state information is updated to 
reflect the new status of the cache line data. In the case of 
either a Purge or Return operation, the Sub-POD(s) having 
previous access rights to the data may no longer use the old 
local copy of the cache line, which is invalid. These Sub- 
POD(s) may only access the cache line after regaining 
access rights in the manner discussed above. 
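
The Purge and Return sequence described above may be sketched as follows. This is a simplified model under stated assumptions, not the MSU logic itself: the directory entry is a plain dictionary, the state name `"msu_owned"` is a hypothetical label for a line with no outside copies, and a Return is always issued to a previous exclusive owner (the sketch does not model whether that copy was actually modified).

```python
def request_exclusive(entry, requester, log):
    """Grant exclusive ownership of one cache line to `requester`.

    entry: {"state": "msu_owned" | "shared" | "exclusive", "holders": set}
    log:   list collecting the operations issued to other Sub-PODs
    """
    if entry["state"] == "shared":
        # Every other Sub-POD with a read copy must invalidate it (Purge).
        for sub_pod in sorted(entry["holders"] - {requester}):
            log.append(("purge", sub_pod))
    elif entry["state"] == "exclusive":
        (owner,) = entry["holders"]
        if owner != requester:
            # The previous owner gives the line back to the MSU (Return).
            log.append(("return", owner))
    # Directory state is updated to reflect the new exclusive owner.
    entry["state"] = "exclusive"
    entry["holders"] = {requester}
    return entry
```

In this sketch a request from Sub-POD 1 for a line shared by Sub-PODs 0, 1, and 2 logs two Purge operations, and a request for a line exclusively owned elsewhere logs a single Return.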

In addition to Return operations, Sub-PODs also provide 
data to be written back to an MSU during Flush operations. 
When a Sub-POD receives a cache line from an MSU, and 
the cache line is to be copied to a cache that is already full,
space must be allocated in the cache for the new data.
Therefore, a predetermined algorithm is used to determine
which older cache line(s) will be disposed of, or "aged out"
of cache to provide the amount of space needed for the new
information. If the older data has never been modified, it 
may be merely overwritten with the new data. However, if 
the older data has been modified, the cache line including 
this older data must be written back to the MSU 110 during 
a Flush Operation so that this latest copy of the data is 
preserved. 
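
The aging-out behavior can be illustrated with a short sketch. The specification says only that a predetermined algorithm selects the victim; the sketch assumes a least-recently-installed policy purely for illustration, and the class name `SmallCache` and its `flushed` list (a stand-in for Flush operations sent to the MSU) are hypothetical.

```python
from collections import OrderedDict


class SmallCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> (data, modified flag)
        self.flushed = []           # stand-in for Flush operations to the MSU

    def install(self, address, data, modified=False):
        """Store a cache line, aging out the oldest line if the cache is full."""
        if address not in self.lines and len(self.lines) >= self.capacity:
            victim, (old_data, was_modified) = self.lines.popitem(last=False)
            if was_modified:
                # Modified data must be written back so the latest copy survives.
                self.flushed.append((victim, old_data))
            # Unmodified data is simply overwritten; nothing is sent to the MSU.
        self.lines[address] = (data, modified)
        self.lines.move_to_end(address)
```

Under these assumptions, only victims whose modified flag is set generate a write-back; clean victims are discarded silently.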

Data is also written to an MSU 110 during I/O Overwrite 
operations. An I/O Overwrite occurs when one of the I/O 
Modules 140 issues an I/O Overwrite command to the MSU. 
This causes data provided by the I/O Module to overwrite 
the addressed data in the MSU. The Overwrite operation is 
performed regardless of which other Sub-PODs have local 
copies of the data when the Overwrite operation is
performed. The directory state information is updated to
indicate that the affected cache line(s) is "Present" in the MSU,
meaning the MSU has ownership of the cache line and no
valid copies of the cache line exist anywhere else in the
system.

In addition to having ownership following an I/O Over- 
write operation, the MSU is also said to have ownership of 
a cache line when the MSU has the most current copy of the 
data and no other agents have a valid local copy of the data. 
This could occur, for example, after a Sub-POD having
exclusive data ownership performs a Flush operation of one 
or more cache lines so that the MSU thereafter has the only 
valid copy of the data. 

Coherency Scheme within a Sub-POD

As discussed above, in the system of the preferred 
embodiment, directory information is recorded in a directory 
memory in the MSU that indicates which of the Sub-POD(s) 
has a particular cache line. The MSU directory does not, 
however, indicate which of the cache memories within a 
Sub-POD has a copy of the cache line. For example, within 
a Sub-POD, a given cache line may reside within the TLC 
310, one or more SLCs 360, and/or one or more First-Level 
Caches of a Sub-POD IP. Information pertaining to the
specific cached data copies is stored in a directory memory
within the TLC.

In a manner similar to that described above with respect
to the MSU, the TLC stores status information about each
cache line in TLC Directory 315 of FIG. 3. This status
information indicates whether the TLC was granted either
exclusive ownership or a read copy of a particular cache line
by the MSU 110. The status information also indicates
whether the TLC has, in turn, granted access to one or more
SLCs in the respective Sub-POD. If the TLC has exclusive
ownership, the TLC may grant exclusive ownership to one
of the SLCs 360 in a Sub-POD 210 so that the IP 350
coupled to the SLC may update the cache line. Alternatively,
a TLC having exclusive ownership of a cache line may also
grant a read copy of the cache line to multiple ones of the
SLCs in a Sub-POD. If the TLC only has a read copy of a
cache line, the TLC may grant a read copy to one or more
of the SLCs 360 in a POD 120 such that the interconnected
IP may read, but not write, the cache line. In this case, the
TLC may not grant any of the SLCs write access to the cache
line.
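
The grant rule in the preceding paragraph reduces to a small decision function. The following is a minimal sketch; the function name `tlc_may_grant` and the string values for access types are hypothetical and serve only to restate the rule.

```python
def tlc_may_grant(tlc_access, requested):
    """Return True if the TLC may satisfy an SLC request on its own.

    tlc_access: "exclusive", "read", or None (what the MSU granted the TLC)
    requested:  "exclusive" or "read" (what the SLC is asking for)
    """
    if tlc_access == "exclusive":
        # An exclusive owner may hand out either ownership or read copies.
        return requested in ("exclusive", "read")
    if tlc_access == "read":
        # A read-only holder may pass on read copies but never write access.
        return requested == "read"
    return False  # no copy at all: the TLC cannot grant anything locally
```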

The TLC tracks the copies that exist within a POD by
recording an indicator identifying one or both of the Buses
330 to which it is coupled. For example, if TLC 310 granted
exclusive ownership of a cache line to SLC 360A, the
indicator stored in the TLC directory for that cache line
identifies Bus 330A as having exclusive ownership. If TLC
310 granted read copies to both SLCs 360A and 360C, the
TLC directory identifies both Buses 330A and 330B as
having read copies. The manner in which this information is
used will be discussed below.
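
The bus-level bookkeeping described above can be sketched as follows. The SLC-to-Bus mapping follows the reference numerals used in this description (SLCs 360A and 360B on Bus 330A, SLCs 360C and 360D on Bus 330B); the dictionary layout and the function name `record_grant` are assumptions made for illustration, and the invalidation traffic that would accompany a change of access type is not modeled.

```python
SLC_TO_BUS = {"360A": "330A", "360B": "330A", "360C": "330B", "360D": "330B"}


def record_grant(directory, line, slc, access):
    """Record that `slc` was granted `access` ("read" or "exclusive") to `line`.

    The TLC directory keeps only a per-Bus indicator, not a per-SLC one.
    """
    bus = SLC_TO_BUS[slc]
    entry = directory.setdefault(line, {"access": None, "buses": set()})
    if access == "exclusive" or entry["access"] != "read":
        # Exclusive ownership, or the first read grant: a single Bus is recorded.
        entry["buses"] = {bus}
    else:
        # Additional read copies may exist on both Buses at once.
        entry["buses"].add(bus)
    entry["access"] = access
    return entry
```

Granting read copies to SLCs 360A and 360C in this sketch leaves both Buses recorded for the line, matching the example in the text.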

When data is provided to an SLC 360, it may also be
provided to the respective First-Level Cache (FLC) within
the IP 350 coupled to that SLC. Generally, whenever an IP
requests a read copy of data, the read copy will be provided
by the SLC to be stored within the IP's FLC. An exception
to this rule occurs for certain system-level clock information
that will become outdated, and therefore is not forwarded to
the FLC. In contrast to read data, a cache line that is obtained
by the SLC from the TLC on an exclusive ownership basis
is not generally forwarded to the FLC for storage. An
exception to this rule occurs for certain resources that are
associated with software locks, and which must be cached
within the FLC until the IP releases the lock. The SLC
includes Tag RAM Logic (not shown in FIG. 3) to record
whether the associated FLC stores a copy of a particular
cache line. This will be discussed further below.

As discussed above, the directory status information
stored within the MSU 110 is used to maintain data
coherency throughout the entire system. In a similar manner, the
directory status information within the TLC is used to
maintain data coherency within the respective Sub-POD
210. Within the Sub-POD, data coherency is maintained for
each of the Buses 330, and is also maintained for the
Sub-POD as a whole.

Data coherency is maintained for each of the Buses 330
using a snooping mechanism. If an IP 350 makes a request
for an address that is not present in either the respective FLC
or SLC, the SLC initiates a request via the respective FSB
Logic 380 to the associated Bus 330. The request will
indicate the type of request (read or write), and will also
indicate the request address. Each SLC monitors, or
"snoops," the Bus 330 via its respective FSB Logic for these
types of requests from the other SLC. When such a request
is detected, the SLC that detected the request checks its
internal Tag RAM to determine whether it stores a modified
copy of the requested data. If it does store a modified copy
of the requested data, that data is provided on Bus 330 so
that a copy can be made within the requesting SLC.
Additionally, if the requesting SLC is requesting exclusive
ownership of the data, the other (non-requesting) SLC must
also mark its resident copy as invalid, since only one SLC
may have write ownership at a given time. Furthermore, if
the SLC detecting the request determines that its associated
FLC also stores a copy of the cache line that is requested for
exclusive ownership, that SLC must direct the FLC to
invalidate its local copy.

If an SLC is requesting a cache line that has not been
modified by the other SLC that resides on the same Bus 330,
the TLC 310 will handle the request. In this case, the SLC
presents the request to Bus 330, and because the associated
SLC does not respond to the request in a pre-determined
period of time with snoop results, the TLC handles the
request.

The TLCs process requests from the SLCs in the
associated Sub-POD by determining if that Sub-POD has been
granted the type of access that is being requested, and if so,
how the requested cache line may be obtained. For example,
a TLC may not grant a request made by an SLC for exclusive
ownership of a cache line if the TLC itself has not been
granted exclusive ownership. If the TLC has been granted
exclusive ownership, the TLC must further determine if the
other (non-requesting) Bus 330 has, in turn, been granted
exclusive ownership. If the other Bus 330 has exclusive
ownership of the data, the TLC issues a request to that Bus
to initiate return of the data. Because the SLCs are snooping
the Bus, this request will be detected, and an SLC owning
the data will return any modified copy of the data to the
TLC. Additionally, any copies of the requested cache line
will be marked as invalid. The TLC may then provide the
data to the requesting SLC and update the directory
information to indicate that the other Bus 330 now has the
exclusive ownership.

A similar mechanism is used if the SLC is requesting read
access. If the TLC has been granted read access by the MSU
for the requested cache line, the data is provided to the
requesting SLC and the directory information is updated to
reflect that the associated Bus 330 has read access of the
data. Both Buses may be granted read access to the cache
line simultaneously.

In yet another scenario, the TLC may not have a copy of
the requested cache line at all, or may not have the type of
access that is requested. This could occur for a number of
reasons. For example, a TLC may obtain a copy of a cache
line from the MSU, provide it to one or more of the SLCs
in its Sub-POD, then later age the cache line out of memory
to make room for another cache line. This aging out of the
cache line in the TLC may occur even though an SLC in the
Sub-POD still retains a copy. This is allowed because the
cache memories of the preferred embodiment are not
inclusive caches. That is, each cache line residing within an SLC
does not necessarily reside in the associated TLC 310. As a
result of this non-inclusive cache configuration, a request by
any of the SLCs in the Sub-POD for the cache line may
result in a cache miss at the TLC even if the cache line is
stored in another SLC. A cache miss could also occur
because the requested cache line does not reside in the TLC
or in any other one of the caches in the respective Sub-POD.
In yet another instance, an SLC may be requesting exclusive
ownership of a cache line, but the TLC has only been
granted a read copy of a requested cache line. In any of these
cases, the TLC must make a request via the TCM 220 to the
respective MSU Interface (MI) 130 for the cache line.

After a TLC makes a request via the respective MI
Interface for access to a cache line, the request is presented
to MSU 110, and the directory logic within the MSU
determines where the most current copy of the data resides.
This is accomplished in the manner discussed above. If the
MSU owns the most recent copy of the data, the data may
be provided immediately to the requesting TLC with the
requested permission as either a read copy or with exclusive
ownership. Similarly, if only a read copy of the data is being
requested, and the MSU has granted only read copies to
other Sub-PODs 210, the MSU may immediately provide
the additional read copy to the requesting TLC. However, if
exclusive ownership is being requested, and the MSU has
already granted exclusive ownership to another TLC 310 in
another Sub-POD, the MSU must initiate a Return operation
so that the TLC currently owning the data returns any
updated data. Additionally, if exclusive ownership is being
requested, the MSU must initiate a request to any other
Sub-POD having a copy of the cache line directing that
Sub-POD to invalidate its copy. These MSU requests may
take a substantial amount of time, especially if a large
number of [...] Sub-PODs having current copies of the
requested cache line.

From the above discussion, it is apparent that if a large
number of requests are being processed across the MI
Interfaces, the necessity to request exclusive ownership
from the MSU may substantially increase the time required
to perform a write operation. The current invention
minimizes the time required to obtain exclusive ownership by
prefetching ownership before a write request is actually
being processed.

Description of the Ownership Prefetching System
of the Current Invention

FIG. 4 is a block diagram of the Instruction Processor
350A and Second Level Cache 360A of the preferred
embodiment. Although the logic within Coherency Domain
320A is shown and described, it will be understood that the
following description applies to all other coherency domains
included in Data Processing System 100. Within the Instruc-
tion Processor, Processing Logic 402 executes instructions
and processes operands retrieved from one of the cache
memories included in Coherency Domain 320A, or from
MSU 110. Processing Logic will attempt to first retrieve an
instruction or operand from FLC 404 by making a request on
Line 406. If the address is not located in FLC 404, a cache
miss indication is provided to Processing Logic 402 on Line
408. As a result, Processing Logic will make a read request
to SLC 360A on Line 410. The request is captured in Read
Request Register 412, and is presented to the cache Tag
RAM Logic 414 and to the Data RAM Logic 420 in parallel
on Lines 413A and 413B, respectively.

In a manner known in the art, Tag RAM Logic 414
determines whether the requested address is resident within
the SLC 360A. If it is, a hit signal is provided to Data RAM
Logic on Interface 418 so that the requested cache line data
that has already been read from Data Storage Devices 419 of
Data RAM Logic 420 is gated onto Line 424 to be returned
to Processing Logic 402. If the request address is not
resident within the SLC 360A, a cache miss indication is
provided to Control Logic 426 on Control Interface 428. In
response, Control Logic receives the read request signals
from Read Request Register 412 on Line 427, and forwards
the read request on Line 429 to Interface 385A. In turn, FSB
Logic 380A receives the request from Interface 385A, and
reformats the request into the request format used by Bus
330A.

After the request is provided to Bus 330A, SLC 360B
detects the request using logic similar to Bus Snooping
Logic 432 shown for SLC 360A. The Bus Snooping Logic
for SLC 360B receives the request signals from Interface
385B on an interface similar to that shown as Line 431 for
SLC 360A. The SLC 360B Bus Snooping Logic reads state
bits stored in its Tag RAM Logic to determine whether a
cache line is resident within the SLC, and whether the cache
line is available as a shared read-only copy or as an
exclusively-owned copy. The state bits further record
whether the copy has been modified, and whether the copy
is still valid or whether it has been marked as invalid such
that it may no longer be used.

In the current example, if the state bits in SLC 360B for
the requested cache line indicate that the cache line is
exclusively owned by SLC 360B, and has also been
modified by SLC 360B, SLC 360B provides the updated copy on
Line 430 to Interface 385B. SLC 360B will also invalidate
its copy, and cause the associated FLC to invalidate its copy,
if necessary. FSB Logic 380B receives and translates the
data from the format used by the SLC to the format required
by Bus 330A.

After FSB Logic 380B provides the re-formatted data to
Bus 330A, FSB Logic 380A receives this data and translates
it back to the data format used by the SLC. The data is
provided on Line 430 of SLC 360A so that a read-only copy
of the data may be stored in Data RAM Logic 420. In
addition, control signals are provided on Line 431 to Bus
Snooping Logic 432 of SLC 360A so that Bus Snooping
Logic may update the Tag RAM Logic 414 to record read
ownership.

TLC 310 also snoops Bus 330A, and detects that SLC
360B has provided the updated data to SLC 360A.
Therefore, TLC does not respond to the request. The TLC
updates its stored cache line data copy to reflect the
modifications made by SLC 360B, and also records that Bus
330A now has a copy of the data for read purposes only.

If SLC 360B did not have an updated copy of the data,
TLC 310 handles the request. Assuming the TLC has gained
access rights to the cache line from MSU 110 as determined
by the TLC state bits, and the cache line is either not resident
in any of the other SLCs in the Sub-POD 210A, has only
been provided to the TLC as a read copy, or is exclusively
owned by the TLC and no other SLC in the Sub-POD has
been granted exclusive ownership, the data may be provided
to SLC 360A on Bus 330A upon receipt of the request.

If the TLC has been granted exclusive ownership of
the cache line by MSU 110, and one of the SLCs 360C or
360D has an exclusive copy, the TLC must initiate a request
on Bus 330B so that the SLC owning the data will return any
data updates and the exclusive ownership to the TLC 310. As
noted above, this is referred to as a "Return" operation. The
SLC having the data will detect this request using associated
Bus Snooping Logic such as that shown as Bus Snooping
Logic 432 for SLC 360A. In response, the SLC will return
the data on Bus 330B to TLC, which will in turn forward that
data to the requesting SLC 360A. TLC 310 will update the
cache line data to reflect any modifications made by the
previous owner, and will also update its state bits to record
the new status and location of the data copy as being a read
copy that was made available to Bus 330A.

If the TLC does not have a copy of the requested cache
line, the TLC makes a request across MI Interface 130 to the
MSU 110. If the MSU owns that data, the data may be
returned to the TLC 310 upon receipt of the request by the
MSU. Likewise, if only read copies have been provided to
one or more other TLCs, the MSU may provide the
requested cache line to TLC 310. However, if one of the
other TLCs has been granted exclusive ownership of the
requested cache line, MSU 110 must send a request to the
other TLC directing that TLC to invalidate its copy and
return ownership to the MSU. In response, the TLC will use
its state bits to determine if any of the SLCs in its associated
Sub-POD 210 has been granted exclusive ownership of the
data. The TLC will direct the SLC to return any modified
copy of the data to the TLC, and to mark any copy of the
cache line resident in either a FLC or SLC as invalid so that
it can no longer be used. The TLC will likewise mark its
copy as invalid, and any cache line updates will be
forwarded on the associated MI Interface 130 to MSU 110. This
data may then finally be provided as a read copy to the
requesting TLC 310.

When the TLC receives the requested data, the TLC will
make an entry in its directory memory for the cache line,
then provide the data to Coherency Domain 320A via Bus
330A. The data is forwarded to SLC 360 via FSB Logic
380A, Interface 385A, and Line 430. The data is written to
Data RAM Logic 420, and is also provided on Line 424 to
Processing Logic 402 of IP 350A.

It may be noted that the return of the requested data to the
Processing Logic 402 may require a delay, since the data
may have to be returned from another Sub-POD 210. During
this time, the Processing Logic is stalled waiting for the read
data. This "stalling" of the IP during read requests involving
data not available within a cache memory will be discussed
in more detail below.

To prevent IP "stalls" from occurring during write
requests, an ownership prefetch mechanism is implemented
which minimizes the delay in obtaining ownership of a
cache line that is not present within an IP's SLC. When the
Processing Logic 402 is writing a modified operand to
memory, the requested write address is presented to the FLC
404. If a cache hit occurs, the write operation occurs to the
FLC. Regardless of whether a cache hit occurs to the FLC
404, the updated data will also be written to SLC.

Before the modified data is presented to the SLC, it is
temporarily stored in Write Buffer Logic 434. Write Buffer
Logic is capable of storing up to eight write requests at once.
The data stored within the Write Buffer Logic need not be
written to the SLC immediately. That is, generally the
Processing Logic 402 may continue executing instructions
even though the write data has not been written to the SLC.
Processing Logic 402 is only required to wait for the
completion of a write operation within the SLC in those
situations in which a read operation is requesting access to
the same addressable memory location as a pending write
request. To detect this situation, the read request on Line 410
is provided to Write Buffer Logic 434, Write Request 2
Logic 438, and Write Request 1 Logic 454 to be compared
against all pending write addresses. The conflict is indicated
using signals on Lines 462 and 468, respectively. If a conflict
is detected, the Processing Logic 402 must wait for the write
operation to the SLC to complete so that the IP is guaranteed
to receive updated data.

As stated above, the requests stored in the Write Buffer
Logic need not be completed immediately since the
Processing Logic 402 does not stall waiting for the write data to
be written back to the SLC. Additionally, read operations
performed after a write request is added to Write Buffer
Logic 434 may by-pass the write operation using Read
Request Register 412, and may be completed before the
write operation is presented to the SLC. The expedited
processing of read requests is performed to minimize the IP
stalls that occur while the SLC 360A is waiting for the return
of exclusive ownership of a cache line.

When a request is removed from Write Buffer Logic 434,
it is written on Line 436 to a storage device included within
Write Request 2 Logic 438 shown as Storage Device 2
438A. A designated signal included in the write request
indicates that a valid request is now present in Write Request
2 Logic. Control Logic 426 receives this valid write request
on Interface 440. Control Logic also receives signals on Line
442 from a request staging register included within Write
Request 1 Logic 454. This staging register is shown as
Storage Device 1 454A. If Control Logic determines that a
valid request is present in Storage Device 2 438A, but is not
present within Storage Device 1 454A, Control Logic
provides an enable signal on Line 444 to Write Request 2 Logic
and Write Request 1 Logic. This enable signal allows the
request to be staged on Line 456 from Storage Device 2
438A to Storage Device 1 454A. At the same time, Write
Request 2 Logic provides an acknowledge signal on the
interface shown as Line 436 to Write Buffer Logic 434 to
indicate that a second write request may be removed from
Write Buffer Logic 434 and sent to Write Request 2 Logic
438.

The write request stored in Write Request 1 Logic 454 is
presented to the Tag RAM Logic 414 and Data RAM Logic
420. The Tag RAM Logic determines whether the requested
address is resident in Data RAM Logic 420. If it is, Tag
RAM Logic 414 provides a signal on Line 418 indicating the
initiated write operation may be completed to the Data RAM
Logic. Tag RAM Logic also updates its state bits to indicate
that the SLC stores a modified copy of the cache line. As
discussed above, the state bits are used to determine when
data needs to be provided on Line 430 to Interface 385A in
response to Bus Snooping Logic 432 detecting a request for
an updated cache line. If Tag RAM Logic indicates that the
requested cache line is either not resident within Data RAM
Logic 420, or is only available as a read copy, Tag RAM
Logic indicates this status to Control Logic 426 on Interface
428. In a manner similar to that discussed above in reference
to read request operation processing, Control Logic provides
a request for ownership of the cache line on Line 429 to
Interface 385A so that FSB Logic 380A receives and
reformats the request into the format used by Bus 330A.

After the request is presented to Bus 330A, it is processed
in the manner that is similar to that discussed above with
respect to read requests. Namely, if TLC 310 has been
granted exclusive ownership of the data by MSU 110, and no
SLC has a copy of the data, the TLC may provide the exclusive
ownership to SLC 360A. If SLC 360B has been granted
exclusive ownership of the requested cache line by TLC
310, and if SLC 360B has modified this cache line, the data
will be provided by SLC 360B on Bus 330A to SLC 360A.
This occurs because the Bus Snooping Logic within SLC
360B detects the request, and further recognizes that a
modified copy is stored in Data RAM Logic for SLC 360B.
The copy in SLC 360B will be marked as invalid. Otherwise,
if SLC 360B has an unmodified copy of the requested data,
TLC 310 provides the copy of the data.

Alternatively, one of the SLCs 360C or 360D on Bus
330B may have exclusive ownership of the requested data.
In this case, the data must be returned by the previous owner
to TLC 310 so that it can be provided to SLC 360A.
Additionally, the state bits within TLC 310 will be updated
to reflect that Bus 330A, and not Bus 330B, now has
exclusive ownership of the cache line. Finally, if only read
copies have been provided by TLC 310 to one or more of the
SLCs 360, the TLC must issue requests to the one or more
of the Buses 330A and/or 330B having the read copies so
that the SLCs 360 having these copies mark them as invalid.
Exclusive ownership may then be provided to SLC
360A.

The above scenarios assume that TLC 310 has already
gained exclusive ownership from MSU 110 for the requested
cache line. If this is not the case, the TLC makes a request
across MI Interface 130 to the MSU 110. If the MSU owns
that data, the data may be returned upon receipt of the
request to the TLC 310. If the MSU does not own the data
and determines that other read copies have been provided to
one or more other TLCs, the MSU must send a request to the
other one or more TLCs directing that any copies in the
associated one or more Sub-PODs 210 be invalidated. This
invalidation process occurs in the manner discussed above.
After the invalidation requests are issued by the MSU, the
MSU may thereafter provide the requested cache line to the
requesting TLC 310 and update its directory memory to
reflect the new cache line status.

According to another scenario, the MSU may have
granted exclusive ownership of the requested cache line to
another one of the TLCs in one of the other Sub-PODs in the
system. After using the state bits in the directory memory to
determine which one of the TLCs owns the cache line, the
MSU sends a request to that TLC directing it to invalidate
the local copy and return ownership. In response, the TLC
will use its state bits to determine if any of the SLCs in its
associated Sub-POD 210 has been granted exclusive
ownership or a read copy of the data. The TLC will request that
any copy in the SLC and/or associated FLC be invalidated.
Any updates to the cache line that are resident within an SLC
must be returned to the TLC in the manner discussed above
to be forwarded to the MSU. The MSU will grant ownership,
and, if necessary, provide an updated copy of the cache line
to TLC 310 in Sub-POD 210A. The MSU will also update
its data copy and modify the state bits in its directory
memory to reflect the exclusive ownership that has been
granted to TLC 310 in Sub-POD 210A.

Once ownership for a cache line has been granted to TLC
310, the state bits for the cache line are updated within the
TLC, and the TLC forwards the data on Bus 330A to the
requesting SLC, which in this example is SLC 360A. This
data is received by FSB Logic 380A, where it is translated
into a format required by the SLC 360A. Then it is provided
on Interface 385A and Line 430 to be written to Data RAM
Logic 420. Additionally, control signals on Bus 330A are
received by FSB Logic 380A, are translated into the
SLC-required format, and are passed on Line 431 to Bus
Snooping Logic 432. In response to these control signals indicating
that ownership has been granted to SLC 360A, Bus
Snooping Logic 432 provides a signal on Line 464 to Control
Logic 426 indicating the received ownership. In response,
Control Logic issues signals on Line 428 to update the status
information stored in Tag RAM Logic to record the
ownership, and to further record that the cache line is
modified. Control Logic also issues a signal on Line 466
indicating that Write Request 1 Logic 454 may now provide
the modified request data to Data RAM Logic 420 on the
interface shown as Line 470. When the write operation is
completed as indicated by an acknowledge provided by the
Data RAM Logic on Line 470, Storage Device 1 is cleared
and becomes available to receive another request.

The above description illustrates the possibly lengthy
process associated with gaining exclusive ownership in a
system employing a directory-based main memory and a
hierarchical cache structure. To minimize the impact of the
delay associated with gaining this exclusive ownership, the
current invention provides a mechanism that allows multiple
requests for ownership to be pending from the same IP at
once.

Returning to FIG. 4 and the current example, it will be
recalled that after a first request is staged from Write Request
2 Logic 438 to Write Request 1 Logic 454, an acknowledge
signal is issued on Line 436 to Write Buffer Logic 434. If
one or more of the write requests is pending in Write Buffer
Logic, a predetermined one of the pending requests is

data will be provided on the interface shown as Line 470 to
Data RAM Logic for processing. An acknowledge signal
issued on Line 436 to Write Buffer Logic 434 will signal that
a new request may be staged to Write Request 2 Logic 438
in the manner discussed above.

If a latter request stored in Write Request 2 Logic 438
does not result in a cache hit, or if the requested cache line
is not exclusively owned by SLC 360A, Tag RAM Logic
414 indicates the cache miss on Interface 428 to Control
Logic 426. Control Logic receives the request from Write
Request 2 Logic 438 on Line 440. This request is then
forwarded on Line 429 to Interface 385A, is translated into
another format by FSB Logic 380A, and is then provided to
the Bus 330A to be processed in the manner discussed
above. It may be noted that at this time, ownership for the
previously-issued request may not yet have been returned to
SLC 360A. Thus, two requests for ownership are pending at
the same time.

When multiple requests for ownership are pending at
once, ownership may not necessarily be granted in the order
the requests were issued. That is, ownership for the request
stored in Write Request 2 Logic 438 may be returned prior
to that for the previously-received request stored in Write
Request 1 Logic 454. This is because the time required to
process the request depends on the numbers of levels within
the hierarchical memory that must be accessed to process the
request. This may vary significantly as discussed above.
When ownership is returned for the latter request first, the
latter request must not be processed until the ownership
associated with the former request has been returned and the
request is completed. This is necessary to maintain data
consistency, as is discussed above. Therefore, regardless of
the order in which ownership is obtained, Control Logic 426
allows the former request to complete in the manner
discussed above. Thereafter, Control Logic causes the latter
request to be staged into Write Request 1 Logic 454 to be
completed while another pending request is transferred to

retrieved and provided on Line 436 to Write Request 2 Logic Write Request 2 Logic 438. The completion of this request 

438 where it will be stored. In the preferred embodiment, the may entail waiting while ownership is returned. However, in 

oldest pending request is selected as the predetermined one many cases, the ownership will already be available, and the 

of the requests. It will be assumed for the current example 40 write operation to Data RAM Logic 420 may be completed 

that a valid request is still resident in Write Request 1 Logic immediately without delay. According to the preferred 

454 at the time the latter request is stored in Write Request embodiment, in some instances in which ownership is 

2 Logic. Control Logic 426 detects the valid request signals obtained for a latter-issued write request before ownership is 

provided with each of the requests in Write Request 2 Logic obtained for an earlier-issued write request, the ownership 

and Write Request I Logic, and determines that the more 45 for that latter request is relinquished before the associated 

recently-provided request may not be staged to Write write operation can be completed. This is done to expedite 

Request 1 Logic. Instead, the latter request is maintained in read request processing in certain situations in which two 

Write Request 2 Logic, and is provided on the interface SLCs are requesting access to the same cache line. For 

shown as Line 460 to Tag RAM Logic 414. Tag RAM Logic example, Bus Snooping Logic 432 of SLC 360A may detect 

determines whether the SLC already has exclusive owner- 50 that a read request has been issued on Bus 330Afor the same 

ship of the requested cache line. cache line that was requested by the latter-issued write 

If Tag RAM Logic determines that a write request stored request. Such a read request could either be issued by the 
in Write Request 2 Logic 438 requests access to a cache line SLC 360B, or by the TLC 310, wherein the TLC is respond- 
that is exclusively owned by SLC 360A, the request is ready ing to a request initiated on Bus 330B, or a request from 
to be processed by Data RAM Logic 420 since ownership 55 MSU 110. Since this latter-issued write request that is 
need not be obtained. However, to maintain data pending within SLC 3 60A can not be completed until the 
consistency, write operations must be performed in the order earlier-issued write request is processed, and since it is 
in which they are issued. This prevents a previously- issued undesirable to delay the SLC that issued the read request 
request from overwriting data provided by a more recently- until both of the write requests are completed to SLC 360A, 
issued request. Therefore, if a request stored in Write 60 the ownership associated with the latter write request is 
Request 1 Logic 454 is still pending when the cache hit relinquished. After the earlier issued write request has 
occurs for the latter request, the latter request must wait until completed, the latter request will be staged to Write Request 
the exclusive ownership is provided for the previous request, 2 Logic 438 in SLC 360A in the manner described above, 
and the previous request has been completed. When the Then a second request will be made to Bus 330A to again 
request stored in Write Request 1 Logic 454 is completed in 65 obtain ownership of the requested cache line, 
the manner discussed above, Control Logic 426 will stage FIGS. 5A, 5B, and 5C, when arranged as shown in FIG. 
the latter request to Write Request 1 Logic 454, and the write 5, are a flowchart illustrating the manner in which two 
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requests for ownership are processed simultaneously processing time may be at least partially "buried" for one of 

according to the memory coherency scheme of the preferred the two write requests. This can significantly increase 

embodiment. Block 500 illustrates a first request being throughput. Additionally, the second-level cache design 

transferred to Write Request 2 Logic 438. The first request allows read operations to by-pass the write operations so the 

is staged to Write Request 1 Logic 454 if a valid request is 5 processing time associated with gaining exclusive owner- 

not already stored in Write Request 1 Logic, as shown in ship does not impact the read requests. This is the case for 

Decision Block 502 and Block 504. Then a second request all read requests except those to cache lines that are asso- 

is transferred to Write Request 2 Logic 438, as shown in ciated with write requests. Read operations to cache lines 

Block 506. If a request was already stored in Write Request associated with pending write requests may not be per- 

1 Logic 454 when the first request in the flow diagram was Q formed until the data updates associated with the pending 

received, the first request remains stored in Write Request 2 write requests have been recorded in the SLC, as discussed 

Logic 438, as shown by Arrow 508. above. Finally, the use of the write buffer allows up to eight 

Next, both of the pending write requests stored in Write requests to be queued before a write request issued by the IP 

Request 1 and Write Request 2 Logic are processed accord- causes the IP to stall. That is, the IP is not required to 

ing to the following steps, as indicated by Block 510. If the 15 discontinue processing instructions until the IP makes a 

SLC has ownership of the requested cache line, processing write request at a time when eight write requests are already 

continues to FIG. 5C as indicated by Decision Block 512 pending in the write buffer, and an additional two requests 

and Arrow 514. If this is the older of the two pending are pending in the SLC 3 60 A. 

requests, the write request is completed, as indicated by It may be noted that a mechanism similar to that provided 
Decision Block 516, Arrow 518, and Block 520. Otherwise, 20 by the current invention for write requests could likewise be 
this request must remain pending until the oldest request is implemented for read requests. That is, a system for pro- 
completed so that data coherency is maintained. After the viding multiple read requests for cache lines not present in 
oldest pending request is completed, this request may be the SLC could be implemented in a manner similar to that 
staged to the Write Request 1 Logic, and the write operation shown in FIG. 4 for write requests. However, a design 
may be performed, as indicated by Arrow 521, Blocks 522 25 choice was made to exclude this logic for read requests in 
and 524, respectively. the preferred embodiment of the current system for several 

Returning to FIG. 5 A, if the SLC does not own the reasons. First, a large percentage of read operations involve 

requested cache line, processing continues to FIG. 5B as instruction fetches. During the execution of a sequence of 

indicating by Arrow 526. The request is provided to Bus 330 instructions, instruction execution is often re-directed by the 

as shown by Block 528. If the TLC 310 in the requesting 30 occurrence of a jump, skip, or other such instruction. Obtain- 

SLC's Sub-POD 210 does not have ownership of the ing a read copy of a cache line that is subsequently deter- 

rcquested cache line, the TLC must obtain ownership from mined to be unneeded because execution re-direction has 

the MSU 110. This is illustrated by Decision Block 530 and occurred can waste system resources. Thus, for many read 

Block 532. Then the TLC provides the requested data and situations, it is considered undesirable to obtain a prefetched 

ownership to the requesting SLC, as shown by Arrow 533 35 copy of the read data. Additionally, since a cache line 

and Block 534. Processing then continues to FIG. 5C as including a block of instructions should not, in most 

shown by Arrow 514 to be concluded in the manner dis- instances, undergo modification, it will not be exclusively 

cussed above that is required to maintain data coherency. owned by any cache in the system. Thus, even if the MSU 

If the TLC does own the requested cache line, it must be does not own a requested cache line, only read access has 

determined whether any other SLC in the Sub-POD has been 40 been provided by the MSU to other caches in the system. As 

granted ownership to this requested data. If the SLC on the a result, the MSU need not initiate a return operation to fetch 

same Bus 330 as the requesting SLC has been granted ownership and/or updated data, and a request for the cache 

ownership to the data and has a modified data copy, the data line may be processed without delay. Thus, a shorter access 

and ownership are provided by this SLC to the requesting time is generally associated with many read requests as 

SLC, as illustrated by Decision Block 536 and Block 538. 45 compared to the time required to complete the average write 

Processing then continues to FIG. 5C to be concluded in the request, making it less necessary to bury the read access 

manner discussed above, as shown by Arrows 540 and 514. times following a read miss to an SLC 360. 

Otherwise, if an SLC within the same Sub-POD 210 but While various embodiments of the present invention have 

located on the other Bus 330 from the requesting SLC has been described above, it should be understood that they have 

ownership of the cache line, ownership and any modified 50 been presented by way of example only, and not as a 

data is returned from this previous owner via the TLC 310 limitation. Thus, the breadth and scope of the present 

to the requesting SLC. This is shown in Decision Block 542 invention should not be limited by any of the above- 

and Block 544, respectively. Processing then continues to described exemplary embodiments, but should be defined 

FIG. 5C, as shown by Arrows 545 and 514. Finally, if no only in accordance with the following Claims and their 

other SLC in the Sub-POD has been granted ownership of 55 equivalents, 

the requested data, the data and ownership are provided by What is claimed is: 

the TLC 310, as shown by Arrow 546 and Block 547. Then 1. For use in a data processing system having a main 

processing continues to FIG. 5C to be concluded in the memory to store data items and a processor coupled to make 

manner discussed above, as shown by Arrow 514. requests to the main memory to read from or to write to 

The current system increases throughput in several ways. 60 selected ones of the data items, wherein the processor must 

First, two requests for ownership may be pending simulta- be granted ownership status by the main memory for a 

neously. As stated previously, exclusive ownership may requested one of the data items before the processor may 

have to be acquired by making a request to the MSU, which write to the requested one of the data items, a memory 

in turn, must make a request to another TLC. The time system, comprising: 

required to process the write requests may therefore be 65 first request logic to receive from the processor a first 

significant. The current invention allows two requests for request to write to a first selectable one of the data items 

ownership to be processed at once, so that request- stored in the main memory, and in response thereto, to 
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request that ownership status be granted by the main 
memory for said first selectable one of the data items; 
and 

second request logic coupled to said first request logic and 
to the processor to receive from the processor a second 5 
request to write to a second selectable one of the data 
items stored in the main memory, and in response 
thereto, and while said first request is still pending to 
the main memory, to request that ownership status be 
provided by the main memory for said second select- 10 
able one of the data items. 

2. The memory system of claim 1, and further including
a cache memory coupled to said first request logic to receive
said first one of the data items from the main memory, and
to perform said first request and said second request after
ownership of said first selectable one of the data items and
ownership of said second selectable one of the data items
have been granted, respectively.

3. The memory system of claim 2, and further including
a tag memory coupled to said cache memory to store status
data signals to indicate whether ownership of any of the data
items stored in the main memory has been granted to the
processor.

4. The memory system of claim 3, wherein said first
request logic is coupled to said tag memory to determine
whether said status data signals indicate ownership has
already been granted for said first selectable one of the data
items, and if so, to provide said first request directly to said
cache memory without first requesting ownership from the
main memory.

5. The memory system of claim 4, wherein said second
request logic is coupled to said tag memory to determine
whether said status data signals indicate ownership has
already been granted for said second selectable one of the
data items, and if so, to provide said second request directly
to said cache memory without first requesting ownership
from the main memory.

6. The memory system of claim 2, and further including 
a control circuit coupled to said first request logic and to said 
second request logic to ensure all requests issued by the
processor to write to ones of the data items stored in the main 
memory are presented to said cache memory in the order in 
which said all requests are issued by the processor. 

7. The memory system of claim 6, and further including
a storage device coupled to said second request logic to
receive from the processor, and to temporarily store, multiple
pending requests to write to ones of the data items
stored in the main memory, said multiple pending requests
being temporarily stored if said first request logic and said
second request logic already store valid ones of said requests
to write to ones of the data items.

8. The memory system of claim 1, and further including 
read request logic coupled to the processor to receive from 
the processor a read request to read a selectable one of the 
data items stored in the main memory, said read request
logic to allow said read request to be processed before any 
pending request to write to a selectable one of the data items 
stored in the main memory. 
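
The read handling recited in claim 8 can be illustrated with a short Python sketch. It is only a toy model under stated assumptions: the class name `ReadBypass`, its methods, and the use of strings as cache-line identifiers are all hypothetical and do not appear in the specification. The sketch also applies the restriction noted in the description, namely that a read to a cache line with a pending write waits until that write has been recorded.

```python
class ReadBypass:
    """Toy model: reads pass pending writes unless they target the same line.

    All names are hypothetical; this is not the patented circuit.
    """

    def __init__(self):
        self.pending_writes = []  # lines with writes awaiting completion, oldest first
        self.held_reads = []      # reads blocked by a pending write to the same line
        self.serviced = []        # log of operations in the order they were performed

    def write(self, line):
        # A write waits (for example, for ownership) before it can complete.
        self.pending_writes.append(line)

    def read(self, line):
        if line in self.pending_writes:
            self.held_reads.append(line)          # conflict: wait for the update
        else:
            self.serviced.append(("read", line))  # no conflict: bypass the writes

    def complete_oldest_write(self):
        line = self.pending_writes.pop(0)
        self.serviced.append(("write", line))
        # Release any held read whose line no longer has a pending write.
        still_held = []
        for held in self.held_reads:
            if held in self.pending_writes:
                still_held.append(held)
            else:
                self.serviced.append(("read", held))
        self.held_reads = still_held


rb = ReadBypass()
rb.write("A")
rb.write("B")
rb.read("C")   # serviced at once, ahead of both pending writes
rb.read("B")   # held until the write to line B has been recorded
rb.complete_oldest_write()
rb.complete_oldest_write()
print(rb.serviced)
# [('read', 'C'), ('write', 'A'), ('write', 'B'), ('read', 'B')]
```

The model keeps a single flat list of pending writes for simplicity; it does not distinguish the write buffer from the two request stages described for the preferred embodiment.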

9. For use in a data processing system having a main
memory and multiple processors coupled to the main
memory each to issue requests to read from, and to write to,
selectable portions in the main memory, the main memory
including a directory memory to store status signals
indicating, for each of the selectable portions, whether any
respective one of the multiple processors has acquired
ownership of the selectable portion such that modifications
may be made to the selectable portion by the respective one
of the processors acquiring ownership, the data processing
system further including write request logic respectively
coupled to a selected one of the multiple processors, a
method of processing requests provided by said multiple
processors to access ones of the selectable portions of the
main memory, comprising the steps of:

a.) receiving a first write request issued by the selected
one of the multiple processors, said first write request
requesting write access to a first one of the selectable
portions, said first write request to be stored in the write
request logic respectively coupled to said selected one
of the multiple processors;

b.) receiving a second write request issued by said
selected one of the multiple processors, said second
write request requesting write access to a second one of
the selectable portions, said second write request to be
stored in said write request logic respectively coupled
to said selected one of the multiple processors;

c.) issuing a first ownership request from said write
request logic respectively coupled to said selected one
of the multiple processors, said first ownership request
being issued to the main memory to obtain ownership
for said first one of the selectable portions; and

d.) issuing a second ownership request from said write
request logic respectively coupled to said selected one
of the multiple processors, said second ownership
request being issued to the main memory to obtain
ownership for said second one of the selectable portions
while said first ownership request is still pending
to the main memory.

10. The method of claim 9, wherein the data processing 
system further includes write request logic respectively 
coupled to multiple selected ones of the multiple processors, 
and further including the step of: 

e.) performing steps a.)-d.) by each of said multiple
selected ones of the multiple processors in the data 
processing system at the same time. 

11. The method of claim 9, wherein the data processing
system further includes a cache memory coupled to the
selected one of the multiple processors, and further including
the steps of:

e.) receiving ownership from the main memory for said
first one of the selectable portions;

f.) completing said first write request to the cache memory
for said first one of the selectable portions; and

g.) repeating steps a.) and c.) for another write request
issued by the selected one of the multiple processors.

12. The method of claim 11, and further including the 
steps of: 

h.) receiving ownership from the main memory for said
second one of the selectable portions after ownership is
received for said first one of the selectable portions;

i.) completing said second write request to the cache
memory for said second one of the selectable portions;
and

j.) repeating steps b.) and d.) for another write request 
issued by the selected one of the multiple processors. 

13. The method of claim 11, and further including the
steps of:

h.) receiving ownership from the main memory for said
second one of the selectable portions before ownership
is received for said first one of the selectable portions;

i.) waiting until ownership is received from the main
memory for said first one of the selectable portions;

j.) completing the write request to the cache memory for
said first one of the selectable portions;

k.) completing the write request to the cache memory for
said second one of the selectable portions after completing
the write request to the cache memory for said
first one of the selectable portions; and

l.) repeating steps a.)-d.) for two additional write requests
issued by the selected one of the multiple processors.

14. The method of claim 12, wherein the data processing 
system includes a storage device respectively coupled to the 
selected one of the multiple processors, and further including
the steps of:

storing any of said write requests received from the 
selected one of the multiple processors in the respec- 
tively coupled storage device if the write request logic 
has already stored said first and said second write 
requests; 

providing ones of the requests stored during said storing 
step to said write request logic during said steps a.) and 
b.) after processing has completed for said first and said 
second write requests. 

15. The method of claim 12, wherein the data processing 
system includes read request logic coupled to said selected 
one of the multiple processors, and further including the 
steps of: 

receiving a read request issued by the selected one of the 
multiple processors requesting read access to one of the 
selectable portions, said read request being issued after 
said first write request and said second write request 
were issued; and 

allowing said read request to be processed prior to completing
either of said first write request or said second
write request.

16. The method of claim 12, wherein the data processing 
system further includes a tag memory coupled to the cache 
memory to record, for each selectable portion of the main 
memory, whether ownership has already been granted to 
said selected one of the processors, and further including the 
steps of: 

reading the tag memory to determine whether ownership
for said first selectable portion of the main memory has
already been granted to said selected one of the processors;
and

skipping steps c.) and e.) if ownership for said first
selectable portion of the main memory has already been
granted to the selected one of the processors.

17. The method of claim 16, and further including the
steps of:

reading the tag memory to determine whether ownership
for said second selectable portion of the main memory
has already been granted to said selected one of the
processors; and

skipping steps d.) and h.) if ownership for said second
selectable portion of the main memory has already been
granted to the selected one of the processors.
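
The method of claims 9 through 17 can be sketched in Python as follows. This is a minimal software model, not the hardware of the preferred embodiment: the class `WriteRequestLogic`, its method names, and the string identifiers for the selectable portions are hypothetical. It assumes two staged write requests, an unbounded holding buffer, and ownership that is never relinquished once granted.

```python
from collections import deque


class WriteRequestLogic:
    """Toy model of the claimed method; every name here is hypothetical."""

    def __init__(self, owned=()):
        self.owned = set(owned)  # stands in for the tag memory of claim 16
        self.slots = []          # at most two staged writes, oldest first
        self.buffer = deque()    # stands in for the storage device of claim 14
        self.pending = []        # ownership requests outstanding at main memory
        self.cache = {}          # cache memory contents: portion -> data

    def write(self, portion, data):
        """Steps a.) and b.): receive a write request from the processor."""
        if len(self.slots) < 2:
            self._stage(portion, data)
            self._complete_in_order()
        else:
            self.buffer.append((portion, data))  # both stages already hold requests

    def grant(self, portion):
        """Main memory returns ownership for `portion`."""
        self.owned.add(portion)
        if portion in self.pending:
            self.pending.remove(portion)
        self._complete_in_order()

    def _stage(self, portion, data):
        self.slots.append((portion, data))
        # Steps c.) and d.), skipped when the tag memory already shows ownership.
        if portion not in self.owned and portion not in self.pending:
            self.pending.append(portion)

    def _complete_in_order(self):
        # Only the oldest staged write may complete, whatever the grant order.
        while self.slots and self.slots[0][0] in self.owned:
            portion, data = self.slots.pop(0)
            self.cache[portion] = data
            if self.buffer:
                self._stage(*self.buffer.popleft())


logic = WriteRequestLogic(owned={"X"})
logic.write("A", 1)  # miss: ownership requested for A
logic.write("B", 2)  # miss: second request issued while the first is pending
logic.write("X", 3)  # both stages are full, so this write is held in the buffer
logic.grant("B")     # granted out of order: nothing is written yet
logic.grant("A")     # A completes, then B, then the buffered write to X
print(list(logic.cache.items()))
# [('A', 1), ('B', 2), ('X', 3)]
```

The ordering rule lives entirely in `_complete_in_order`, which only ever examines the oldest staged request. That mirrors steps h.) through k.) of claim 13, where an early grant for the second request has no visible effect until the first write completes.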

18. For use in a data processing system having a main 
memory to store data items and multiple processors coupled 
to make requests to the main memory to access ones of the 
data items, wherein any of the multiple processors must be 
granted ownership status before a predetermined type of the 
requests to the main memory may be completed, a system 
for handling memory requests, comprising: 

first request means for receiving from each of the multiple 
processors a respective first request of the predetermined
type each requesting access to a respective first
one of the data items, and for presenting each of said 
respective first requests to the main memory to gain 
ownership of each said respective first one of the data 
items if the respective requesting one of the multiple 
processors has not already obtained ownership status 
from the main memory; and 

second request means for receiving from each of the 
multiple processors a respective second request of the 
predetermined type each requesting access to a respective
second one of the data items, and for presenting
each of said respective second requests to the main 
memory to gain ownership of each said respective 
second one of the data items if the respective requesting 
one of the multiple processors has not already obtained 
ownership status from the main memory, said second 
requests to be presented to the main memory while said 
first requests are still pending to the main memory. 

19. The system of claim 18, and further comprising cache
means coupled to each of the multiple processors for temporarily
storing ones of the data items retrieved from the
main memory, and for processing each of said first requests
after ownership for each of said respective first ones of the
data items is obtained from the main memory, and for
processing each of said second requests after ownership for
said respective second ones of the data items is obtained
from the main memory.

20. The system of claim 19, and further comprising 
control means coupled to said first request means and to said 
second request means for ensuring that multiple requests 
issued by any same one of the multiple processors are 
processed by said cache means in time-order regardless of 
the order in which ownership is granted by the main 
memory. 
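
The effect of keeping two ownership requests pending while still completing the writes in time-order, as recited in claims 18 through 20, can be illustrated with a small timing calculation. The function below is a hypothetical sketch: the latencies are arbitrary time units, both overlapped requests are assumed to be issued at time zero, and the time taken by the cache write itself is ignored.

```python
def completion_times(latencies, overlapped=True):
    """Return the time at which each write completes, in issue order.

    latencies[i] is an assumed ownership-grant latency for write i.
    With overlapped=False, each ownership request is issued only after
    the previous write has completed.
    """
    done = []
    clock = 0
    for latency in latencies:
        grant = latency if overlapped else clock + latency
        clock = max(clock, grant)  # a write never completes before its predecessor
        done.append(clock)
    return done


print(completion_times([10, 4], overlapped=False))  # [10, 14]
print(completion_times([10, 4], overlapped=True))   # [10, 10]
print(completion_times([4, 10], overlapped=True))   # [4, 10]
```

In the overlapped cases the second write finishes at the larger of the two latencies rather than at their sum, which is the sense in which part of the request-processing time is hidden for one of the two writes.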
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